Gaining insight on a University Student Population Using Network Access Data

University of New Hampshire ECE 824 Final Project

Authors: Colin Cambo, Austin Smith

NOTE: THIS DOCUMENT IS AN INTERACTIVE JUPYTER NOTEBOOK. IT CONTAINS ALL THE CODE NECESSARY TO REPRODUCE OUR RESULTS. THROUGHOUT THIS PAPER YOU WILL SEE CODE EXAMPLES SHOWING HOW EACH STEP WAS EXECUTED, WITH AN EXPLANATION OF THE CODE IN ITALICS.

For more information on Jupyter Notebooks please check <a href = 'http://jupyter.org/about.html'>here.</a>

Notebook Contents

0 Introduction

<a id = '0'></a>

The movement toward smart cities and smart campuses is a growing global trend and an active area of research within the field of Ubiquitous Computing. The idea is that managing information systems and installing smart technology can help solve problems, or improve areas that need improving, on a college campus or in a city. Some of the questions involved include:

  • "How can we create a safe environment for students?"
  • "Where are students spending the most time and how can access to those spaces be managed more effectively?"
  • "Where can more WiFi access points be added to improve internet access for students?"
  • "Where and when will students be most likely to see a mass advertisement or an important message from the university?"

The list of improvements university administrators would like to make is immense, and the idea of smart campus technology is to use networked technology in the background to streamline campus operations and make them more efficient. The problem with the smart campus concept is that adding these technologies and systems is complex and costly for the university, especially relative to the utility they may provide. Often, a cost-benefit analysis of these technologies shows that, while the results are impressive, the systems are not worth the cost.

In response to this issue, we considered whether our university (the University of New Hampshire) could use its existing technology infrastructure to learn how students travel around campus. Tracking connection times at the various access points on campus would give the university a way to follow student movement patterns and understand how the student body uses the campus as a whole.

0.1 Requirements

<a id = '0.1'></a> If you have Python installed through Anaconda you will have all the necessary packages except Basemap.

Here is a link to how to install Basemap with pip.

In [1]:
import pandas as pd
from datetime import datetime
from collections import defaultdict, Counter
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.patches as mpatches
from ipywidgets import widgets
from IPython.display import display, clear_output
from ipywidgets import interact, interactive, fixed
from matplotlib import colors as colors_
from mpl_toolkits.basemap import Basemap

%matplotlib inline

1 Data

<a id = '1'></a>

1.1 University of New Hampshire Wifi Data

<a id = '1.1'></a>

By working with the University of New Hampshire Information Technology department, we were able to secure a week's worth of historical data. This dataset detailed every single connection to the university wireless network in that time frame. The following variables were included:

  • 'MAC_Address': The unique MAC address of the device that made the connection to the network. Every wireless chip has a unique MAC address associated with it.
  • 'Time': The date and time of the connection, in the format '2015-09-19 23:59:59'.
  • 'Access_Point': The unique identifier of the access point through which the connection was made.
  • 'Description': A description of where that access point is located, for example 'Gables Building C Room 605C'.

The data we were given was split into 14 separate CSV files. For each day there was one file for residential building connections and one file for all other buildings.

Randomizing Data
To keep this data secure and anonymous, we altered some of it. We replaced the dates of the week we were given with those of a random week so that no one could find their own connections, while making sure the weekdays still line up so that general insights can be gathered. We also replaced the last six hex digits of every MAC address with an assigned number, so that the values no longer represent anyone's actual MAC address. We chose to assign a number between 0x000000 and 0xFFFFFF instead of hashing because hashing the last six digits greatly increased the length of the string, and we believe the assigned number serves the same purpose of keeping users' MAC addresses anonymous. Finally, we removed the access point descriptions, so a connection can only be placed in a building, not in a specific room. This saved space and adds another layer of security, since it is now very difficult to know which access point is in which room.
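The renumbering idea described above can be sketched as follows. This is only an illustration under our own assumptions, not the script that was used to prepare the data: the function name `anonymize_macs`, the sample addresses, and the choice to number devices in order of first appearance are all hypothetical.

```python
def anonymize_macs(macs):
    """Replace the last six hex digits of each MAC address with an assigned number.

    The six-digit manufacturer prefix is kept so that vendor lookups still work,
    and the same input address always maps to the same output address.
    """
    assigned = {}  # original MAC -> anonymized MAC
    result = []
    for mac in macs:
        mac = mac.replace(':', '').upper()
        if mac not in assigned:
            # Assumption: number devices in order of first appearance,
            # zero-padded to six hex digits (0x000000 - 0xFFFFFF).
            assigned[mac] = mac[:6] + format(len(assigned), '06X')
        result.append(assigned[mac])
    return result

# Made-up addresses for illustration only
sample = ['00:19:9d:aa:bb:cc', '38:ca:da:11:22:33', '38:ca:da:11:22:33']
print(anonymize_macs(sample))
# ['00199D000000', '38CADA000001', '38CADA000001']
```

Unlike a hash, the assigned number keeps every address at twelve hex characters, which matches the space concern mentioned above.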

The code below reads in two sample datasets.

In [2]:
#Converters for standardizing data being read in
convert_time = lambda x: x.replace('+00', '')
convert_mac = lambda x: x.replace(':', '').upper()

auth_df = pd.read_csv(r'.\data\WiFi_Data\randomized_auth_2015-09-19.csv',names=['MAC_Address', 'Time', 'Access_Point'], 
                 converters={'Time':convert_time, 'MAC_Address':convert_mac})
xt_auth_df = pd.read_csv(r'.\data\WiFi_Data\randomized_xt_auth_2015-09-19.csv',names=['MAC_Address', 'Time', 'Access_Point'], 
                 converters={'Time':convert_time, 'MAC_Address':convert_mac})

For illustrative purposes, the first ten rows of both datasets are shown below. The next steps were to work with these variables and create a few additional variables to make the data more useful.

In [3]:
print('This is the first ten rows of the residential data set!')
print(auth_df.head(10))
print('Length: ', len(auth_df))
This is the first ten rows of the residential data set!
    MAC_Address                 Time Access_Point
0  00199D000000  2015-09-19 23:59:59      GBCAP63
1  38CADA000001  2015-09-19 23:59:59       STAP86
2  DC415F000002  2015-09-19 23:59:59      WSMAP08
3  AC5F3E000003  2015-09-19 23:59:59      STAP113
4  4C6641000004  2015-09-19 23:59:59      SRBAP24
5  68DBCA000005  2015-09-19 23:59:59       NDAP07
6  38CADA000001  2015-09-19 23:59:59       STAP86
7  0056CD000006  2015-09-19 23:59:59       AXAP40
8  78D75F000007  2015-09-19 23:59:59       GSAP03
9  A08869000008  2015-09-19 23:59:59       MSAP76
Length:  511826
In [4]:
print('This is the first ten rows of the non-residential data set!')
print(xt_auth_df.head(10))
print('Length: ', len(xt_auth_df))
This is the first ten rows of the non-residential data set!
    MAC_Address                 Time Access_Point
0  F41BA1000739  2015-09-19 23:59:59      PCAP122
1  90B686003764  2015-09-19 23:59:59       NMAP08
2  D02598000F69  2015-09-19 23:59:59       MBAP06
3  544E900016E1  2015-09-19 23:59:59       NHAP19
4  BC6C2100049D  2015-09-19 23:59:59       PBAP04
5  7831C1002B90  2015-09-19 23:59:58       PKAP01
6  0056CD002F68  2015-09-19 23:59:58       PKAP04
7  2C0E3D000EFA  2015-09-19 23:59:58       PBAP04
8  6C72E7000811  2015-09-19 23:59:58       KBAP36
9  5055270039F8  2015-09-19 23:59:58       KBAP61
Length:  279656

We decided to merge the data into a single DataFrame and add some new columns to make it easier for us to use. Below is the code that merges the 14 files into a single DataFrame and adds three new columns.

The three new columns are as follows:

  • 'Hours': The hour of the connection time, extracted from the main 'Time' column. This makes filtering by date/time a bit easier.
  • 'Minutes': The minute of the connection time, extracted from the main 'Time' column.
  • 'Weekday': The day of the week of the connection time (0 = Monday, 6 = Sunday), extracted from the main 'Time' column.

    NOTE: Running the code below might take a while.

In [5]:
wifi_df = pd.DataFrame() # Initialize empty DataFrame
for i in range(19,26):
    
    auth_df = pd.read_csv(r'.\Data\WiFi_Data\randomized_auth_2015-09-'+str(i)+'.csv', 
                          names=['MAC_Address', 'Time', 'Access_Point'], 
                          converters={'Time':convert_time, 'MAC_Address':convert_mac})
    xt_auth_df = pd.read_csv(r'.\Data\WiFi_Data\randomized_xt_auth_2015-09-'+str(i)+'.csv', 
                             names=['MAC_Address', 'Time', 'Access_Point'], 
                             converters={'Time':convert_time, 'MAC_Address':convert_mac})
    
    #Concatenating DataFrames
    wifi_df = pd.concat([wifi_df, auth_df, xt_auth_df])   

#Creating datetime columns once, after all files have been loaded
datetimes = [(datetime.strptime(str(t), '%Y-%m-%d %H:%M:%S')) for t in wifi_df['Time'].tolist()]
hours = [t.hour for t in datetimes]
day_of_week = [t.weekday() for t in datetimes]
minutes = [t.minute for t in datetimes]

wifi_df['Hours'] = hours
wifi_df['Weekday'] = day_of_week
wifi_df['Minutes'] = minutes
In [6]:
wifi_df.head()
Out[6]:
   Access_Point  Hours   MAC_Address  Minutes                 Time  Weekday
0       GBCAP63     23  00199D000000       59  2015-09-19 23:59:59        5
1        STAP86     23  38CADA000001       59  2015-09-19 23:59:59        5
2       WSMAP08     23  DC415F000002       59  2015-09-19 23:59:59        5
3       STAP113     23  AC5F3E000003       59  2015-09-19 23:59:59        5
4       SRBAP24     23  4C6641000004       59  2015-09-19 23:59:59        5
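As an aside, the same three columns can also be derived with pandas' vectorized datetime tools, which is usually much faster than parsing each timestamp in a Python loop. The sketch below is an alternative under our own assumptions, not the code used for the results in this notebook; the small `demo` frame is made up.

```python
import pandas as pd

# Tiny made-up frame that uses the same 'Time' format as the dataset
demo = pd.DataFrame({'Time': ['2015-09-19 23:59:59', '2015-09-21 08:05:00']})

# Parse the strings once, then read the components through the .dt accessor
parsed = pd.to_datetime(demo['Time'], format='%Y-%m-%d %H:%M:%S')
demo['Hours'] = parsed.dt.hour
demo['Minutes'] = parsed.dt.minute
demo['Weekday'] = parsed.dt.weekday  # 0 = Monday, 6 = Sunday

print(demo)
```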

1.2 Company MAC Address Data

<a id = '1.2'></a> Since we were given MAC addresses with our dataset, we decided it would be useful to match each MAC address with the company it is registered to, so that we could better guess what kind of device it is. We found a nicely formatted CSV of MAC address prefixes and their companies on GitHub (https://github.com/TakahikoKawasaki/nv-oui/blob/master/data/oui.csv). Using this file, we added the following column to our dataset:

  • 'MAC_Company': The first six hex digits of a MAC address identify the manufacturer of the device. We added this column to help us distinguish what sorts of devices are on the network and to allow us to exclude irrelevant MAC addresses.
In [7]:
df = pd.read_csv(r'.\Data\oui.csv')#Reading in MAC Address Company csv as DataFrame
#Converting DataFrame above to dictionary (column 1 = prefix, column 2 = company) for O(1) lookup time
#.iloc replaces the .ix indexer, which has been removed from pandas
oui_dict = dict(zip(df.iloc[:, 1], df.iloc[:, 2]))

def find_device(mac):
    """
    Returns company registered to MAC Address
    
    Keyword Argument:
    mac -- MAC Address Prefix (6 Digit Hexadecimal string)
    """
    #Prefixes missing from the lookup table are labelled 'UNKNOWN'
    return oui_dict.get(mac, 'UNKNOWN')

The code below runs the find_device function for every MAC Address in our dataset and attaches the companies to our dataset.

In [8]:
company = [find_device(str(r)[:6]) for r in wifi_df['MAC_Address'].tolist()]
wifi_df['MAC_Company'] = company

Below are the first five rows of the data with the new column added.

In [9]:
wifi_df.head()
Out[9]:
   Access_Point  Hours   MAC_Address  Minutes                 Time  Weekday                          MAC_Company
0       GBCAP63     23  00199D000000       59  2015-09-19 23:59:59        5                          VIZIO, Inc.
1        STAP86     23  38CADA000001       59  2015-09-19 23:59:59        5                          Apple, Inc.
2       WSMAP08     23  DC415F000002       59  2015-09-19 23:59:59        5                          Apple, Inc.
3       STAP113     23  AC5F3E000003       59  2015-09-19 23:59:59        5  SAMSUNG ELECTRO-MECHANICS(THAILAND)
4       SRBAP24     23  4C6641000004       59  2015-09-19 23:59:59        5                              UNKNOWN
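The prefix lookup can also be written as a single vectorized expression with `Series.map`. The following is a minimal sketch, assuming a lookup dictionary shaped like `oui_dict`; the two-entry `demo_oui` table is a stand-in for the real file and only repeats prefixes that appear in the output above.

```python
import pandas as pd

# Stand-in prefix table; the real one is built from oui.csv
demo_oui = {'38CADA': 'Apple, Inc.', '00199D': 'VIZIO, Inc.'}
demo_macs = pd.Series(['38CADA000001', '00199D000000', '4C6641000004'])

# Slice the six-digit prefix, look it up, and label misses as 'UNKNOWN'
companies = demo_macs.str[:6].map(demo_oui).fillna('UNKNOWN')
print(companies.tolist())
# ['Apple, Inc.', 'VIZIO, Inc.', 'UNKNOWN']
```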
In [10]:
#Makes it easy to convert numbers back to weekday names
weekday_dict = {0:'Monday', 1:'Tuesday', 2:'Wednesday', 3:'Thursday', 4:'Friday', 5:'Saturday', 6:'Sunday'}
In [11]:
unique_mac_cnt = Counter(wifi_df.drop_duplicates(['MAC_Address'])['MAC_Company'].tolist())#Counts # of unique company addresses
top_10_companies = sorted(list(unique_mac_cnt.items()), key= lambda x: x[1], reverse=True)[:10]#Converts top ten to list

Plotting a simple bar graph of the number of unique MAC addresses per company reveals that Apple is overwhelmingly the most common device manufacturer on campus.

In [12]:
plt.bar(range(10), [x[1] for x in top_10_companies])
plt.xticks(range(10), [x[0] for x in top_10_companies], rotation=90)
plt.xlim([0,10])
plt.title('Unique MAC Addresses Per Company On Campus')
plt.ylabel('Number of Unique MAC Addresses')
plt.xlabel('Company')
Out[12]:
<matplotlib.text.Text at 0x15326978>

As the bar graph shows, Apple accounts for more than 25,000 unique devices, while the next largest company has around 4,000. Upon discovering this, we decided to focus solely on Apple devices, as they are more likely to be portable devices such as phones or laptops. While this still includes some devices such as Apple TVs and desktop computers, we felt that focusing solely on Apple devices was the best way to capture students' movements without too much noise from network devices that were unlikely to be traveling with students.

The code below filters the DataFrame for only rows that have 'MAC_Company' equal to 'Apple, Inc.'

In [13]:
wifi_df = wifi_df[wifi_df['MAC_Company']=='Apple, Inc.'].sort_values(by=['Time'])#Selecting only Apple devices
wifi_df = wifi_df.reset_index(drop=True)#Resetting index

1.3 Access Point Locations

<a id = '1.3'></a>

The University of New Hampshire IT department provided us with the csv "Access_Locations.csv" which contains the access point number and the building that access point happens to be in. In order for us to work with the data more easily we added a column to our dataset that consists of which building each connection takes place in.

The code below reads in the "Access_Locations.csv" dataset as a DataFrame, then iterates through it, adding each access point and its corresponding building to a dictionary. The dictionary is then used to generate a new column, "Building", for every row in the original wifi_df dataset. Rows whose access point has no known building are dropped.

In [14]:
aploc = pd.read_csv(r'.\data\Access_Locations.csv', header=0)
building_dict = defaultdict(lambda: 'Unknown')
for row in aploc.iterrows():
    building_dict[row[1]['access_point']] = row[1]['building']
    
buildings_add = [building_dict[key] for key in wifi_df['Access_Point'].tolist()]
wifi_df['Building'] = buildings_add
wifi_df = wifi_df[wifi_df['Building']!='Unknown'].reset_index(drop=True)

We noticed that when tracking an individual user, we would get a dataset showing the following:

In [16]:
mac_track_df = wifi_df[wifi_df.MAC_Address=='38CADA000001'].reset_index(drop=True)
mac_track_df.head(10)
Out[16]:
   Access_Point  Hours   MAC_Address  Minutes                 Time  Weekday  MAC_Company    Building
0        BTAP12     18  38CADA000001       26  2015-09-19 18:26:36        5  Apple, Inc.      Barton
1        CTAP01     18  38CADA000001       32  2015-09-19 18:32:37        5  Apple, Inc.       Craft
2        STAP78     20  38CADA000001       18  2015-09-19 20:18:20        5  Apple, Inc.       Stoke
3        STAP78     20  38CADA000001       23  2015-09-19 20:23:56        5  Apple, Inc.       Stoke
4        SCAP02     22  38CADA000001       30  2015-09-19 22:30:29        5  Apple, Inc.       Scott
5        JDAP09     22  38CADA000001       32  2015-09-19 22:32:36        5  Apple, Inc.  Jessie Doe
6        JDAP16     22  38CADA000001       33  2015-09-19 22:33:59        5  Apple, Inc.  Jessie Doe
7        JDAP16     22  38CADA000001       34  2015-09-19 22:34:06        5  Apple, Inc.  Jessie Doe
8        JDAP16     22  38CADA000001       34  2015-09-19 22:34:06        5  Apple, Inc.  Jessie Doe
9        JDAP16     22  38CADA000001       34  2015-09-19 22:34:51        5  Apple, Inc.  Jessie Doe

This shows us that a majority of connections are just from someone switching access points in the same building, which doesn't give us very much information on their movements on a macro level.

1.4 Unique Building Paths

<a id = '1.4'></a>

To solve the problem of repeated building rows, we chose to eliminate movements within a building, since we are mostly concerned with how students travel around campus. We eliminated these intra-building movements by keeping only the first connection of each consecutive run of connections in the same building.

The section of data shown below illustrates what the dataset looks like after focusing only on unique connections. We removed the non-informative data points and added a variable called 'Time_Since_Last_Connect', which is the time in minutes since the MAC address first connected in its previous building. This gives us an idea of how much time elapsed between arriving at one building and arriving at the next.

In [17]:
mac_list = wifi_df['MAC_Address'].tolist()
building_list = wifi_df['Building'].tolist()
unique_checker = [0]*len(wifi_df)

mac_dict = defaultdict(lambda: 'None')
for i, user in enumerate(mac_list):
    if mac_dict[user]=='None':
        mac_dict[user] = building_list[i]
        unique_checker[i] = 1
    elif mac_dict[user]!=building_list[i]:
        unique_checker[i] = 1
        mac_dict[user] = building_list[i]
wifi_df['Check'] = unique_checker
wifi_df = wifi_df[(wifi_df.Check==1) & (wifi_df.Building!='0')].reset_index(drop=True)
wifi_df = wifi_df.drop(columns='Check')#Keyword form; newer pandas versions no longer accept a positional axis here

last_time_dict = {}
last_time_list = [0]*len(wifi_df)
time = wifi_df['Time'].tolist()

for i, row in enumerate(wifi_df['MAC_Address'].tolist()):
    if row not in last_time_dict:
        last_time_list[i] = 0
        last_time_dict[row] = time[i]
    else:
        last_time_list[i] = int((datetime.strptime(time[i],
                                    '%Y-%m-%d %H:%M:%S') - datetime.strptime(last_time_dict[row],
                                                                            '%Y-%m-%d %H:%M:%S')).total_seconds()/60)
        last_time_dict[row] = time[i]
wifi_df['Time_Since_Last_Connect'] = last_time_list
wifi_df = wifi_df.dropna().reset_index(drop=True)
In [18]:
mac_track_df = wifi_df[wifi_df.MAC_Address=='38CADA000001'].reset_index(drop=True)
In [19]:
mac_track_df.head()
Out[19]:
   Access_Point  Hours   MAC_Address  Minutes                 Time  Weekday  MAC_Company    Building  Time_Since_Last_Connect
0        BTAP12     18  38CADA000001       26  2015-09-19 18:26:36        5  Apple, Inc.      Barton                        0
1        CTAP01     18  38CADA000001       32  2015-09-19 18:32:37        5  Apple, Inc.       Craft                        6
2        STAP78     20  38CADA000001       18  2015-09-19 20:18:20        5  Apple, Inc.       Stoke                      105
3        SCAP02     22  38CADA000001       30  2015-09-19 22:30:29        5  Apple, Inc.       Scott                      132
4        JDAP09     22  38CADA000001       32  2015-09-19 22:32:36        5  Apple, Inc.  Jessie Doe                        2
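The two steps above (keeping only building changes, then measuring the gap to the previous kept row) can also be expressed with a per-device `groupby` and `shift`. This is a sketch under our own assumptions rather than the code used for the results; the `demo` frame is made up, with device 'A' reusing timestamps from the example above and device 'B' added for illustration.

```python
import pandas as pd

demo = pd.DataFrame({
    'MAC_Address': ['A', 'A', 'A', 'B', 'A', 'B'],
    'Time': ['2015-09-19 18:26:36', '2015-09-19 18:32:37', '2015-09-19 18:40:00',
             '2015-09-19 19:00:00', '2015-09-19 20:18:20', '2015-09-19 19:30:00'],
    'Building': ['Barton', 'Craft', 'Craft', 'Stoke', 'Stoke', 'Stoke'],
})
demo['Time'] = pd.to_datetime(demo['Time'])
demo = demo.sort_values('Time').reset_index(drop=True)

# Keep a row only if the same device's previous row was in a different building
# (a device's first row has no previous building, so it is always kept)
prev_building = demo.groupby('MAC_Address')['Building'].shift()
moves = demo[demo['Building'] != prev_building].reset_index(drop=True)

# Whole minutes since the same device's previous kept row; 0 for its first row
gap = moves['Time'] - moves.groupby('MAC_Address')['Time'].shift()
moves['Time_Since_Last_Connect'] = (gap.dt.total_seconds() // 60).fillna(0).astype(int)

print(moves)
```

For device 'A' this gives gaps of 0, 6, and 105 minutes, in line with the first three rows of the table above.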

1.5 UNH Building Locations

To plot the buildings and use them in visualizations, we needed a table containing each building's coordinates in latitude and longitude. We used satellite imagery of the campus and a map to create this table manually. The first five rows are shown below.

In [20]:
unh_buildings = pd.read_csv(r'.\Data\Building_Locations.csv')
In [21]:
unh_buildings.head()
Out[21]:
   building_names   latitude   longitude
0     Adams Tower  43.139059  -70.930673
1       Alexander  43.133578  -70.927602
2         Babcock  43.132371  -70.932693
3          Barton  43.141827  -70.938894
4     Christensen  43.131115  -70.934508
In [22]:
list_buildings = unh_buildings['building_names'].tolist()
In [23]:
#Dictionary with building as key and a tuple of latitude and longitude as value
building_coords = {name:(lat, lon) for name, lat, lon in zip(unh_buildings['building_names'], unh_buildings['latitude'], unh_buildings['longitude'])}

2 Descriptive Statistics

<a id = '2'></a>

The next step in our analysis was to run some descriptive statistics in order to get a better understanding of the data.

2.1 General Descriptive Statistics

<a id = '2.1'></a> After cleaning the data, we ended up with 1,020,992 connections over the course of the week.

In [24]:
len(wifi_df)
Out[24]:
1020992

Next, we looked at which access points on campus were connected to most frequently. The list below shows the top ten access points, ordered by number of connections.

In [25]:
most_connected_ap = sorted(list(Counter(wifi_df['Access_Point'].tolist()).items()), key= lambda x:x[1], reverse=True)[:10]
for ap in most_connected_ap:
    print('Access Point: {} Building: {} Connections: {}'.format(ap[0], building_dict[ap[0]], ap[1]))
Access Point: MKAP33 Building: Murkland Connections: 19133
Access Point: HCAP29 Building: Holloway Commons Connections: 14043
Access Point: ETAP28 Building: Engelhardt Connections: 10849
Access Point: PKAP04 Building: Philbrook Connections: 10831
Access Point: CTAP01 Building: Craft Connections: 7815
Access Point: PKAP05 Building: Philbrook Connections: 7016
Access Point: HCAP12 Building: Holloway Commons Connections: 6149
Access Point: HSAP03 Building: Smith Connections: 5989
Access Point: GSAP39 Building: Gibbs Connections: 5702
Access Point: NHAP22 Building: New Hampshire Connections: 5683
In [26]:
def plot_top_paths(building, days, hour_range, percent=False):
    """
    Plots bar graph top 10 destinations from specified buildings at the specified time/day.
    
    Keyword Arguments:
    building -- name of building (string)
    days -- list of days interested in
    hour_range -- tuple of start and end hours
    percent -- Displays percentage on y-axis if True, raw count if false (boolean)
    """
    hours = list(range(hour_range[0],hour_range[1]+1))
    my_data = wifi_df[(wifi_df.Weekday.isin(days)) & (wifi_df.Hours.isin(hours))]
    paths = {b:defaultdict(int) for b in list_buildings}
    
    for b in list_buildings:
        for c in list_buildings:
            if c != b:
                paths[b][c] = 0
                
    mac_add = {x:'0' for x in my_data['MAC_Address'].tolist()}
    mac_list = my_data['MAC_Address'].tolist()
    build_list = my_data['Building'].tolist()
    
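    for i in range(len(mac_list)):
        if mac_add[mac_list[i]] == '0':
            #First time this device is seen: just record its building ('0' is the placeholder set above)
            mac_add[mac_list[i]] = build_list[i]
        else:
            try:
                paths[mac_add[mac_list[i]]][build_list[i]] += 1
            except KeyError:
                #Previous building is not in list_buildings, so the path is skipped
                pass
            mac_add[mac_list[i]] = build_list[i]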
            
    total = sum([x[1] for x in paths[building].items()])
    top_paths = sorted(list(paths[building].items()), key= lambda x:x[1], reverse=True)
    plot_x, plot_y = [], []
    for i, path in enumerate(top_paths):
        if percent==True:
            plot_y.append(round((path[1]/total),2)*100)
        else:
            plot_y.append(path[1])
        plot_x.append(path[0])
        if i == 10:
            break
    ax = plt.bar(range(len(plot_x)), plot_y, align='center')
    plt.xticks(range(len(plot_x)), plot_x, rotation=90)
    plt.xlim([0,len(plot_x)])
    plt.xlabel('Buildings Travelled to')
    if percent==True:
        plt.ylabel('Percent of Connections')
        
    else:
        plt.ylabel('Number of Connections')
    plt.title('Top Paths From '+ str(building)+ ' On '+', '.join([(weekday_dict[b]) for b in days])+' For Hours '+
              str(hour_range[0]) +' to '+str(hour_range[1]))
    return ax

2.2 Top Paths

<a id = '2.2'></a> After performing some descriptive statistics, we started to visualize paths and devise a way to see what buildings students were traveling to from a given building.

Plotting the top paths from a building can be very informative for understanding how students move throughout campus. The bar chart below comes from a function we created and uses the following parameters:

  • Building: 'Kingsbury'
  • Day: 5 (Saturday)
  • Time Range: 0 to 12 (midnight to 1 PM)

This bar chart shows the next building that students who were in Kingsbury Hall on Saturday between midnight and 1 PM travelled to.

In [27]:
plot_top_paths('Kingsbury', [5], (0,12), percent=True)
Out[27]:
<Container object of 11 artists>
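The counting behind a top-paths chart can be illustrated without any plotting. The sketch below is a simplified stand-in for the function above, under our own assumptions: `count_transitions` is a hypothetical helper, and the `demo_visits` list of time-ordered (device, building) pairs is made up.

```python
from collections import Counter

def count_transitions(visits):
    """Count (from_building, to_building) pairs.

    visits -- iterable of (mac, building) tuples, already sorted by time
    """
    last_seen = {}  # mac -> building of that device's previous row
    transitions = Counter()
    for mac, building in visits:
        if mac in last_seen and last_seen[mac] != building:
            transitions[(last_seen[mac], building)] += 1
        last_seen[mac] = building
    return transitions

# Made-up visits for three devices
demo_visits = [('A', 'Kingsbury'), ('B', 'Kingsbury'), ('C', 'Kingsbury'),
               ('A', 'Dimond Library'), ('B', 'Dimond Library'), ('C', 'Stoke'),
               ('A', 'Stoke')]
counts = count_transitions(demo_visits)

# Destinations from one building, as raw counts and as percentages
from_kingsbury = {dst: n for (src, dst), n in counts.items() if src == 'Kingsbury'}
total = sum(from_kingsbury.values())
shares = {dst: round(100 * n / total) for dst, n in from_kingsbury.items()}
print(from_kingsbury, shares)
```

Here two of the three devices leaving Kingsbury go to Dimond Library next, so it would be the tallest bar at about 67 percent.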
In [28]:
def plot_connections(building, day, color='Black'):
    """
    Plots scatter plot of unique building connections over specified day.
    
    Keyword arguments:
    building -- name of building (string)
    day -- day of the week (int)
    color -- color of points on graph, default Black
    """
    df = wifi_df[(wifi_df['Building'].str.contains(building)==True) & (wifi_df['Weekday']==day)]
    c = Counter(df['Hours'].tolist())
    hours = sorted(c) #Hours with at least one connection, so each count is plotted at its own hour
    ax = plt.scatter(hours, [c[h] for h in hours], color=color)
    plt.xlim([0, 23])
    plt.xlabel('Hour of Day')
    plt.ylabel('Number of Unique Connections')
    plt.title('Connections For '+ str(building))
    return ax

2.3 Hourly Connections

<a id = '2.3'></a> We also created a simple function to show the number of unique connections within a building over the course of a day. Because this function returns the object created by plt.scatter, we can call it multiple times, stack the results on one plot, and label them in a legend for an easy comparison between buildings or days.

Below are two scatterplots created with this function. The first one's parameters are:

  • Building: 'Kingsbury'
  • Day: 0 (Monday)
  • Color: 'Blue'

The second one's parameters are:

  • Building: 'Kingsbury'
  • Day: 4 (Friday)
  • Color: 'Red'

With this simple function we are able to gather a lot of information and answer simple questions such as "What weekday is the busiest for a building?" or "What are the peak hours for the dining halls?"

In [29]:
d = plot_connections('Kingsbury', 0, 'Blue')
e = plot_connections('Kingsbury', 4, 'Red')
plt.legend([d, e], ['Monday', 'Friday'])
plt.show()
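When counting connections per hour, it helps to key every count to its hour explicitly, so that hours with no connections show up as zero instead of silently shifting the remaining points along the x-axis. A minimal sketch of that idea follows; `hourly_counts` is a hypothetical helper and `demo_hours` is made up.

```python
from collections import Counter

def hourly_counts(hours):
    """Return a 24-item list with the number of connections in each hour 0..23."""
    counts = Counter(hours)
    # A Counter returns 0 for hours that never occur, so quiet hours become explicit zeros
    return [counts[h] for h in range(24)]

# Made-up hours for illustration: nothing before 8 AM, a peak at 9 AM
demo_hours = [9, 8, 9, 14, 9]
per_hour = hourly_counts(demo_hours)
print(per_hour[8], per_hour[9], per_hour[14], per_hour[0])
# 1 3 1 0
```

The resulting list can be passed to a call such as plt.scatter(range(24), per_hour), which keeps the x-axis aligned with the actual hour of day.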
In [30]:
def get_plot_paths(day, hour):
    """
    Returns nested dictionary of all building paths and # of people who took path for specified day and hour.
    
    Keyword arguments:
    day -- day of the week (list(int))
    hour -- hours of day (list(int))  
    """
    my_paths = wifi_df[(wifi_df.Weekday.isin(day)) & (wifi_df.Hours.isin(hour))]
    paths = {b:defaultdict(int) for b in list_buildings}
    
    for b in list_buildings:
        for c in list_buildings:
            if c != b:
                paths[b][c] = 0
    mac_add = {x:'0' for x in wifi_df['MAC_Address'].tolist()}
    mac_list = my_paths['MAC_Address'].tolist()
    build_list = my_paths['Building'].tolist()
    
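    for i in range(len(mac_list)):
        if mac_add[mac_list[i]] == '0':
            #First time this device is seen: just record its building ('0' is the placeholder set above)
            mac_add[mac_list[i]] = build_list[i]
        else:
            try:
                paths[mac_add[mac_list[i]]][build_list[i]] += 1
            except KeyError:
                #Previous building is not in list_buildings, so the path is skipped
                pass
            mac_add[mac_list[i]] = build_list[i]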
    return paths

# To see what the function above returns, uncomment the line below
#print(get_plot_paths([5], [1, 2, 3, 4, 5, 6, 7, 8]))
In [31]:
my_colors = 'blue red green yellow purple black orange white teal crimson cyan brown gray hotpink lavender'.split()

def return_all_connections(days, hours):
    """
    Returns dictionary with buildings as keys and # of unique connections in building at specified time/day as values
    
    Keyword arguments:
    days -- days of the week (list(int))
    hours -- hours of each day (list(int))
    """
    my_data = wifi_df[(wifi_df.Weekday.isin(days)) & (wifi_df.Hours.isin(hours))]
    building_count = Counter(my_data['Building'].tolist())
    return dict(building_count.items())

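def plot_path_lines(days, hour_range, building, heat_map=False):
    """
    Plots campus map with either lines for the paths leaving the given buildings or a heat map of connections.
    
    Keyword arguments:
    days -- days of the week (list(int))
    hour_range -- tuple of start and end hours
    building -- list of building names to plot paths from (not used when heat_map is True)
    heat_map -- sizes building dots by number of connections if True, plots path lines if False (boolean)
    """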
    plt.figure(figsize=(12,16))
    hours = list(range(hour_range[0],hour_range[1]+1))
    path_dict = get_plot_paths(days, hours)
    color = 'blue red green yellow purple black orange white teal'.split()
    if len(building)>8:
        print("Too many buildings selected! Can only plot 8")
        return
    
    m=Basemap(projection='merc',
              llcrnrlon=-70.931135416, 
              llcrnrlat=43.1341063997, 
              urcrnrlon=-70.9164369106, 
              urcrnrlat=43.149489287,
              resolution='l', 
              epsg=4236)
    
    m.drawmapboundary(fill_color='#F5F5F5', linewidth=0)
    m.arcgisimage(service='World_Street_Map', xpixels=1000, verbose= False)
    
    #Offsets necessary to plot Google Maps coordinates on our ArcGIS map
    x_offset = 0.0095
    y_offset = 0.004
    
    total=0
    dot_scale = 0
    if heat_map==True:
        heat_dict = return_all_connections(days, hours)
        total = sum([val[1] for val in list(heat_dict.items())])
        dot_scale = 40/(max([val[1] for val in list(heat_dict.items())])/total)
    
    for i in range(len(unh_buildings)):
        #.iloc replaces the .ix indexer, which has been removed from pandas
        x, y = m(unh_buildings.iloc[i,2]+x_offset, unh_buildings.iloc[i,1]+y_offset)
        if heat_map==True:
            try:
                m.plot(x, y, 'o', markersize=dot_scale*(heat_dict[unh_buildings.iloc[i,0]]/total), color='#444444', alpha=0.6)
            except KeyError:
                #Building had no connections in the selected window, so no dot is drawn
                pass
        else:
            m.plot(x, y, 'o', markersize=5, color='#444444', alpha=0.6)
    
    if heat_map==False:
        for i, b in enumerate(building):
            x, y = m(building_coords[b][1]+x_offset, building_coords[b][0]+y_offset)
            m.plot(x, y, 'o', markersize=5, color=my_colors[i], alpha=0.8)

        total = 0
        for b in building:
            for key in unh_buildings['building_names'].tolist():
                total+=path_dict[b][key]
        
        building_max = 0
        for c in building:
            new_max = max([b[1] for b in list(path_dict[c].items())])
            if building_max < new_max:
                building_max = new_max
        alpha_scale = 1/building_max
        
        for i, b in enumerate(building):
            for key in path_dict[b]:
                line_width = (path_dict[b][key]/total)*50*len(building)
                #alpha = (path_dict[b][key]/total)*10*len(building)
                try:
                    lonlist = [building_coords[key][1]+x_offset, building_coords[b][1]+x_offset]
                    latlist = [building_coords[key][0]+y_offset, building_coords[b][0]+y_offset]
                    x, y = m(lonlist,latlist)
                    m.plot( x, y, color=my_colors[i], lw=line_width, alpha=alpha_scale*path_dict[b][key])
                except:
                    pass
        plt.title('Connections From '+str(', '.join(building)) + ' For Days '+ ', '.join([(weekday_dict[day]) for day in days]) \
                  + ' For Time Range ' + str(hour_range[0]) + ' to ' + str(hour_range[1]))

        patches = []
        for i in range(len(building)):
            patches.append(mpatches.Patch(color=my_colors[i], label=building[i]))
        plt.legend(handles=patches)
    else:
        
        plt.title('Heat Map of Connections For Days '+ ', '.join([(weekday_dict[day]) for day in days])+ ' For Time Range ' \
                  + str(hour_range[0]) + ' to ' + str(hour_range[1]))
        print('Biggest dot is '+ str(max([val[1] for val in list(heat_dict.items())]))+ ' Unique Connections')
        

2.4 Geographic Plots

<a id = '2.4'></a> From the outset of this project, we realized that much of this data would be most interesting to see plotted on a map, so we queried ArcGIS's database to get a detailed map of the UNH campus and overlaid the building points on top of it. From there we were able to create some very interesting visualizations that helped us see what was going on around campus.

To get the detailed map, we used an ArcGIS server to download a high-resolution image of campus and then overlaid our points on top. This was definitely one of the more challenging parts of the project, especially because the coordinates we gathered for every building turned out to be slightly off from the ArcGIS image. This is because ArcGIS and Google use different map projections, which differ on a micro level. You can read more about this issue here. To correct for this, we added an x and y offset to every point.

2.4.1 Campus Heat Map

<a id = '2.4.1'></a> For our first visualization, we created a heat map of the campus in which each building is plotted as a dot. The size of each dot varies with the number of unique connections in the building. In this particular plot, which covers Monday for hours 12 through 16 (noon to just before 5 PM), 'Holloway Commons' has the most unique connections (2,918 in total).

In [32]:
plot_path_lines([0], (12,16), [], heat_map=True)
Biggest dot is 2918 Unique Connections
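The dot sizing in the plotting function works out to a simple proportion: the busiest building gets a marker size of 40, and every other building is scaled relative to it. The sketch below reproduces that arithmetic on its own. `marker_sizes` is a hypothetical helper, and the Kingsbury count is made up for illustration; only the 2,918 figure comes from the output above.

```python
def marker_sizes(counts, largest=40):
    """Scale per-building counts to marker sizes, with the busiest building at `largest`."""
    total = sum(counts.values())
    biggest = max(counts.values())
    # Same two-step form as the plotting code: a scale factor, then each building's share
    dot_scale = largest / (biggest / total)
    return {name: dot_scale * (n / total) for name, n in counts.items()}

# 2918 is the Holloway Commons figure; the Kingsbury count is invented
demo_counts = {'Holloway Commons': 2918, 'Kingsbury': 1459}
sizes = marker_sizes(demo_counts)
print({name: round(size, 1) for name, size in sizes.items()})
# {'Holloway Commons': 40.0, 'Kingsbury': 20.0}
```

Because the total cancels out, each marker size is effectively 40 times the building's count divided by the largest count.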

2.4.2 Building Paths

<a id = '2.4.2'></a> Next, we plotted where students were travelling to, using the same hour range (12 through 16) but for Saturday. The plot below shows where students who were in 'Dimond Library', 'Kingsbury', and 'Stoke' connected next. The thickness and opacity of each line show the number of students: the thicker and more opaque the line, the more students travelled that particular path.

In [33]:
plot_path_lines([5], (12,16), ['Dimond Library', 'Kingsbury', 'Stoke'], heat_map=False)
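The counting behind a plot like this can be sketched as below. It is a simplified illustration, not the notebook's implementation: the record layout, the choice to skip repeated connections to the same building, and the scale factor of 50 are assumptions.

```python
from collections import Counter, defaultdict

def next_stop_counts(records, start_buildings):
    """Count moves from each start building to the next distinct building.

    records: iterable of (timestamp, mac_address, building), in any order.
    """
    by_device = defaultdict(list)
    for ts, mac, building in sorted(records):  # sort by time first
        by_device[mac].append(building)

    counts = Counter()
    for stops in by_device.values():
        for here, there in zip(stops, stops[1:]):
            # Assumption: a repeat connection to the same building is not a move
            if here in start_buildings and there != here:
                counts[(here, there)] += 1
    return counts

# Made-up sample data
sample = [(1, 'aa', 'Dimond Library'), (2, 'aa', 'Kingsbury'),
          (1, 'bb', 'Dimond Library'), (3, 'bb', 'Kingsbury'),
          (2, 'cc', 'Stoke'), (4, 'cc', 'Dimond Library')]
paths = next_stop_counts(sample, {'Dimond Library', 'Stoke'})
total = sum(paths.values())
# Line width proportional to each path's share of all counted moves
widths = {path: 50 * n / total for path, n in paths.items()}
print(paths, widths)
```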

2.4.3 Iterative Building Paths

<a id = '2.4.3'></a> We found these results interesting; however, we wanted to track more than a single move from one location and see where students went after each subsequent stop. That's where we got the idea of plotting iterative building paths. Plotting the paths of students iteratively reveals much more about their movements on campus.

2.4.3.1 Unfiltered Iterative Building Paths

<a id = '2.4.3.1'></a> The next group of plots tracks students originating from 'Philbrook Hall'. The first iteration shows where students travelled from 'Philbrook Hall', again with the thickness of the line indicating how many students followed that path. The second iteration shows where those students went next, essentially showing their second stop. Finally, the third iteration shows their third stop, reached from their second. This is a very powerful map, as it lets us visualize how students travel around campus given where they start.

In [34]:
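def get_people_paths(day, hour, building):
    """Return {MAC address: [buildings visited, in row order]} for devices seen in `building`.

    Each path starts at the device's first connection to `building` within the
    given weekdays and hours; earlier connections elsewhere are dropped.
    """
    my_paths = wifi_df[(wifi_df.Weekday.isin(day)) & (wifi_df.Hours.isin(hour))]
    my_people_paths = my_paths[my_paths.Building == building]
    people_list = my_people_paths['MAC_Address'].tolist()
    my_paths2 = my_paths[my_paths.MAC_Address.isin(people_list)]
    full_people_list = my_paths2['MAC_Address'].tolist()
    connection_building = my_paths2['Building'].tolist()
    people_connections = {mac:0 for mac in people_list} #0 marks a device not yet seen in `building`
    
    for i in range(len(my_paths2)):
        if people_connections[full_people_list[i]] != 0:
            #Path already started: record the next connection
            people_connections[full_people_list[i]].append(connection_building[i])
        elif building == connection_building[i]:
            #First connection to the starting building: start the path here
            people_connections[full_people_list[i]] = [building]
            
    return people_connections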

def plot_iterative_path_lines(days, hour_range, building, iterations, start_num='all', max_buildings=15, path_num=15):
    

    build_num = start_num
    building_label, last_buildings = [], [building]
    hours = list(range(hour_range[0],hour_range[1]+1)) #Converting tuple range to list of hours
    people_paths = get_people_paths(days, hours, building) #Getting the people's paths at the specified parameters
    mac_list = people_paths.keys()
    
    all_paths = []
    for b in unh_buildings.iloc[:,0]:
        for c in unh_buildings.iloc[:,0]:
            all_paths.append([b, c])
    
    for it in range(1, iterations+1):

        plt.figure(figsize=(12,16))
        
        if build_num!='all':
            path_dict = {(a[0],a[1]):0 for a in all_paths}
        else:
            path_dict = {b:{c:0 for c in unh_buildings.iloc[:,0]} for b in unh_buildings.iloc[:,0]}

        m=Basemap(projection='merc',
                  llcrnrlon=-70.931135416, 
                  llcrnrlat=43.1341063997, 
                  urcrnrlon=-70.9164369106, 
                  urcrnrlat=43.149489287, 
                  resolution='l', 
                  epsg=4236)
        
        m.drawmapboundary(fill_color='#F5F5F5', linewidth=0)
        m.arcgisimage(service='World_Street_Map', xpixels=1000, verbose= False)

        #Offsets necessary to plot Google Maps coordinates on our ArcGIS map
        x_offset = 0.0095
        y_offset = 0.004

        for i in range(len(unh_buildings)):
            x, y = m(unh_buildings.iloc[i,2]+x_offset, unh_buildings.iloc[i,1]+y_offset)
            m.plot(x, y, 'o', markersize=5, color='#444444', alpha=0.6)

        total = 0
        for mac in mac_list:
            if len(people_paths[mac])>it:
                try:
                    if build_num!='all':
                        path_dict[(people_paths[mac][it-1], people_paths[mac][it])]+=1
                        total+=1
                    else:
                        path_dict[people_paths[mac][it-1]][people_paths[mac][it]]+=1
                        total+=1
                except KeyError:
                    pass #Skip connections to buildings that are not in unh_buildings
        scale = 50
        patches = []
        label_dict, color_dict = {}, {}
        
        if build_num !='all':
            path_list = []
            for b in unh_buildings.iloc[:,0]:
                for c in unh_buildings.iloc[:,0]:
                    try:
                        if path_dict[(b,c)] > 0:
                             if b in last_buildings:
                                path_list.append((b, c, path_dict[(b,c)]))
                    except KeyError:
                        print('Missing path: ', (b, c))
            top_paths = sorted(path_list, key=lambda x:x[2], reverse=True)
            if it==1:
                top_paths = top_paths[:int(build_num)]
            else:
                top_paths = top_paths[:path_num]
                
            top_buildings = [x[0] for x in top_paths]
            count = 0
            for b in top_buildings:
                if b not in color_dict:
                    color_dict[b] = my_colors[count]
                    count+=1
            total = sum([x[2] for x in top_paths])
            scale = int(build_num)*3
            new_path_dict = {(b[0],b[1]):b[2]  for b in top_paths}
            last_buildings = [x[1] for i, x in enumerate(top_paths) if i<max_buildings]
            
            for i, key in enumerate(new_path_dict.keys()):
                line_width = (new_path_dict[key]/total)*scale
                try:
                    lonlist = [building_coords[key[0]][1]+x_offset, building_coords[key[1]][1]+x_offset]
                    latlist = [building_coords[key[0]][0]+y_offset, building_coords[key[1]][0]+y_offset]
                    x, y = m(lonlist, latlist)
                    #alpha = (new_path_dict[key]/total)*(build_num/2)
                    m.plot( x, y, color=color_dict[key[0]], lw=line_width, alpha=.8)#alpha)
                    label = 'Path From '+str(key[0])
                    if label in label_dict:
                        pass
                    else:
                        label_dict[label] = color_dict[key[0]]
                        patches.append(mpatches.Patch(color=color_dict[key[0]], label=label))
                except:
                    pass
            plt.legend(handles=patches)
            
        else:
            #Keep only named buildings and paths with at least one connection
            new_path_dict = {b:{c:n for c,n in dests.items() if c and n} for b,dests in path_dict.items() if b}
            for i, b in enumerate(new_path_dict.keys()):
                for key in new_path_dict[b]:
                    line_width = (new_path_dict[b][key]/total)*scale#*it #*len(path_dict.keys())
                    lonlist = [building_coords[key][1]+x_offset, building_coords[b][1]+x_offset]
                    latlist = [building_coords[key][0]+y_offset, building_coords[b][0]+y_offset]
                    x, y = m(lonlist, latlist)
                    m.plot( x, y, color='Black', lw=line_width, alpha=.8)
        plt.title('Originating From: '+str(building) +' Iteration: '+str(it)+' With '+str(total)+' Connections')
In [35]:
plot_iterative_path_lines([5], (12,16), 'Philbrook', 3, start_num='all')

2.4.3.2 Filtered Iterative Building Paths

<a id = '2.4.3.2'></a> Because the last visualization was so cluttered, we decided to edit the function to accept a few additional arguments that narrow the scope of what we're looking at and color-code the lines accordingly. This helps people draw insight from the plot faster.

The arguments we added are the number of starting paths from the building, the maximum number of buildings to show paths for, and the number of paths to plot after the first iteration.
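The filtering these arguments perform can be sketched in isolation as below. The helper names `top_paths` and `next_origins` and the sample counts are hypothetical; the sketch only mirrors the sort-and-truncate idea, not the full plotting function.

```python
def top_paths(path_counts, last_buildings, limit):
    """Keep the `limit` busiest paths that start from one of `last_buildings`.

    path_counts: dict mapping (origin, destination) -> number of devices.
    """
    candidates = [(o, d, n) for (o, d), n in path_counts.items()
                  if n > 0 and o in last_buildings]
    return sorted(candidates, key=lambda p: p[2], reverse=True)[:limit]

def next_origins(paths, max_buildings):
    """Destinations of the busiest paths become the origins of the next iteration."""
    return [dest for _, dest, _ in paths[:max_buildings]]

# Made-up counts for illustration
counts = {('Philbrook', 'Kingsbury'): 5, ('Philbrook', 'Christensen'): 9,
          ('Philbrook', 'Parsons'): 2, ('Kingsbury', 'Parsons'): 7}
first = top_paths(counts, ['Philbrook'], limit=2)
origins = next_origins(first, max_buildings=1)
print(first, origins)
```

Limiting the origins carried into the next iteration is what keeps later plots readable: only the most travelled destinations are followed further.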

The plot below uses the following arguments:

  • Starting Building: Philbrook
  • Day: Friday
  • Time: 12:00 PM to 4:00 PM
  • Iterations: 4
  • Starting Paths: 10
  • Max Buildings: 8
  • Number of Paths After First Iteration: 30
In [38]:
plot_iterative_path_lines([5], (12,16), 'Philbrook', 4, start_num=10, max_buildings=8, path_num=30)

With these additional arguments we are able to easily trace the most common paths from any building and follow where they're going next for additional iterations.

This series of iterations does a great job of illustrating a Friday in the life of a student who is likely eating breakfast or lunch at 'Philbrook', which is a dining hall. The first iteration seems to show students leaving Philbrook and travelling to various academic buildings (Kingsbury, Parsons, etc.) and dormitories (Christensen, Williamson, Haaland, etc.).

The second iteration then shows that many people who went to these dormitories and academic buildings actually end up going back to Philbrook, or at least come close enough to connect to its WiFi.

The third iteration again shows that most people are connected to Philbrook and they're now overwhelmingly going back to their dorms.

The fourth iteration then shows that once again many people are connecting to Philbrook.

This is a very interesting plot because there could be a lot going on beneath the surface that one might miss without being familiar with the UNH campus. Rather than this original group of people eating three times within four hours, they're most likely connecting to the Philbrook dining hall while walking past it, because the path from Christensen and Williamson to campus happens to pass Philbrook.

Another possible explanation is that the access points' clocks aren't synchronized, so when we sort our entries by time they won't be in the proper order. This is something we would like to investigate further, but we haven't had the time yet.
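The walk-by explanation could be checked by dropping very short stops before building the paths. The sketch below is one hypothetical way to do that and is not part of our analysis: because the data has no disconnect times, it approximates the time spent at a stop by the gap until the next connection, and the 10-minute threshold is an arbitrary assumption.

```python
def drop_walk_bys(stops, min_dwell_minutes=10):
    """Remove stops that are followed by another connection too quickly.

    stops: list of (minutes_since_midnight, building) for one device, sorted by time.
    The last stop is always kept because its dwell time is unknown.
    """
    kept = []
    for (t, building), (t_next, _) in zip(stops, stops[1:]):
        if t_next - t >= min_dwell_minutes:
            kept.append((t, building))
    if stops:
        kept.append(stops[-1])
    return kept

# Made-up example: a three-minute connection to Philbrook on the way to class
stops = [(720, 'Christensen'), (735, 'Philbrook'),
         (738, 'Kingsbury'), (800, 'Philbrook')]
filtered = drop_walk_bys(stops)
print(filtered)
```

If the repeated returns to Philbrook largely disappear after such a filter, that would support the walk-by explanation over the clock-ordering one.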

3 Interactive Descriptive Plots

<a id = '3'></a> To make these plots accessible to everyone for gaining insight, we decided to make them interactive.

3.1 Interactive Top Path Comparison

<a id = '3.1'></a> From this interactive interface one is able to quickly compare where most people are going from two buildings at a specified day and time.

On the Plot1 tab, select the building you're interested in from the dropdown. Check the boxes next to the days of the week you're interested in, and then drag the time slider to the time range you want to look at. Repeat these steps for plot 2 on the Plot2 tab, and use the checkbox next to the button to choose whether you want the results as percentages or as raw counts.

The example below compares Philbrook and Stillings on Monday and Wednesday between the hours of 7 and 10. From this chart we are not only able to see that Philbrook has more morning traffic on these days, but we're also able to see that most people's next stop is close to the dining hall they're eating at.
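The difference between the raw-count and percentage views can be sketched as below. The helper `top_destinations` and the sample data are hypothetical; they only illustrate the two display modes, not the plotting function used by this interface.

```python
from collections import Counter

def top_destinations(next_stops, top_n=5, percent=False):
    """Return the most common next stops as raw counts or as percentages.

    next_stops: list of building names, one per device leaving the origin.
    """
    counts = Counter(next_stops)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    if percent:
        return [(building, 100 * n / total) for building, n in top]
    return top

# Made-up next stops for four devices
stops = ['Christensen', 'Christensen', 'Williamson', 'Kingsbury']
raw = top_destinations(stops, top_n=2)
pct = top_destinations(stops, top_n=2, percent=True)
print(raw, pct)
```

Percentages make two buildings with very different traffic volumes comparable, while raw counts show which building is busier overall.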

In [39]:
def plot_both_top_paths(b):
    
    clear_output(wait=True)
    weekday_list1 = [i for i in range(7) if top_path_hbox2.children[i+1].value==True]
    weekday_list2 = [i for i in range(7) if top_path_hbox6.children[i+1].value==True]
    plt.figure(figsize=(16,12))
    plt.subplot(2, 2, 1)
    plot_top_paths(top_path_building_dropdown1.value, weekday_list1, top_path_time_slider1.value,
                   percent=top_path_percent_checkbox.value)
    plt.subplot(2, 2, 2)
    plot_top_paths(top_path_building_dropdown2.value, weekday_list2, top_path_time_slider2.value,
                   percent=top_path_percent_checkbox.value)
    plt.show()


top_path_building_text1 = widgets.Latex(value='Select Building One:', width='20%')
top_path_building_dropdown1 = widgets.Dropdown(options = list_buildings, height='25px')
top_path_hbox1 = widgets.HBox(children=[top_path_building_text1, top_path_building_dropdown1], width='100%', height='50px')

top_path_weekday_text1 = widgets.Latex(value='Day of Week:', width='10%')
top_path_monday_checkbox1 = widgets.Checkbox(description = 'Monday:   ', value=False, width='10%')
top_path_tuesday_checkbox1 = widgets.Checkbox(description = 'Tuesday:  ', value=False, width='10%')
top_path_wednesday_checkbox1 = widgets.Checkbox(description = 'Wednesday:', value=False, width='10%')
top_path_thursday_checkbox1 = widgets.Checkbox(description = 'Thursday: ', value=False, width='10%')
top_path_friday_checkbox1 = widgets.Checkbox(description = 'Friday:   ', value=False, width='10%')
top_path_saturday_checkbox1 = widgets.Checkbox(description = 'Saturday: ', value=False, width='10%')
top_path_sunday_checkbox1 = widgets.Checkbox(description = 'Sunday:   ', value=False, width='10%')
top_path_hbox2 = widgets.HBox(height='40px',width='100%')
top_path_hbox2.children = [top_path_weekday_text1, top_path_monday_checkbox1, top_path_tuesday_checkbox1, top_path_wednesday_checkbox1,
                        top_path_thursday_checkbox1, top_path_friday_checkbox1, top_path_saturday_checkbox1, top_path_sunday_checkbox1]

top_path_hour_text1 = widgets.Latex(value='Enter Time Range:', width='20%')
top_path_time_slider1 = widgets.IntRangeSlider(min=0,max=23,step=1,value=(0,23), width='80%')
top_path_hbox3 = widgets.HBox(height='40px',width='100%')
top_path_hbox3.children = [top_path_hour_text1, top_path_time_slider1]

top_path_submit = widgets.Button(description='Plot Top Paths!')
top_path_submit.on_click(plot_both_top_paths)
top_path_percent_checkbox = widgets.Checkbox(description='Display in percent: ', value=False, width=40)
top_path_hbox4 = widgets.HBox(height='40px',width='100%')
top_path_hbox4.children = [top_path_submit, top_path_percent_checkbox]

top_path_building_text2 = widgets.Latex(value='Select Building Two:', width='20%')
top_path_building_dropdown2 = widgets.Dropdown(options = list_buildings, height='25px')
top_path_hbox5 = widgets.HBox(children=[top_path_building_text2, top_path_building_dropdown2], width='100%', height='50px')

top_path_weekday_text2 = widgets.Latex(value='Day of Week:', width='10%')
top_path_monday_checkbox2 = widgets.Checkbox(description = 'Monday:   ', value=False, width='10%')
top_path_tuesday_checkbox2 = widgets.Checkbox(description = 'Tuesday:  ', value=False, width='10%')
top_path_wednesday_checkbox2 = widgets.Checkbox(description = 'Wednesday:', value=False, width='10%')
top_path_thursday_checkbox2 = widgets.Checkbox(description = 'Thursday: ', value=False, width='10%')
top_path_friday_checkbox2 = widgets.Checkbox(description = 'Friday:   ', value=False, width='10%')
top_path_saturday_checkbox2 = widgets.Checkbox(description = 'Saturday: ', value=False, width='10%')
top_path_sunday_checkbox2 = widgets.Checkbox(description = 'Sunday:   ', value=False, width='10%')
top_path_hbox6 = widgets.HBox(height='40px',width='100%')
top_path_hbox6.children = [top_path_weekday_text2, top_path_monday_checkbox2, top_path_tuesday_checkbox2, top_path_wednesday_checkbox2,
                        top_path_thursday_checkbox2, top_path_friday_checkbox2, top_path_saturday_checkbox2, top_path_sunday_checkbox2]

top_path_hour_text2 = widgets.Latex(value='Enter Time Range:', width='20%')
top_path_time_slider2 = widgets.IntRangeSlider(min=0,max=23,step=1,value=(0,23), width='80%')
top_path_hbox7 = widgets.HBox(height='40px',width='100%')
top_path_hbox7.children = [top_path_hour_text2, top_path_time_slider2]



top_path_tab1 = widgets.VBox(children=[top_path_hbox1, top_path_hbox2, top_path_hbox3])
top_path_tab2 = widgets.VBox(children=[top_path_hbox5, top_path_hbox6, top_path_hbox7])

top_path_tab = widgets.Tab(children=[top_path_tab1, top_path_tab2])
top_path_tab.set_title(0, 'Plot1')
top_path_tab.set_title(1, 'Plot2')
display(top_path_tab)

display(top_path_hbox4)

3.2 Interactive Hourly Connection Scatterplot

<a id = '3.2'></a> From this interactive interface one is able to compare the hourly connections for up to three different buildings on various days.

On each plot tab just select the values you want to plot. Select the building from the dropdown bar, the day of the week from the slider, and the color from the dropdown bar at the bottom. Then select which plots you want to include and hit the "Display Plots!" button.

The example below shows the hourly connections for Gables A, Gables B, and Gables C on Saturday. As the chart shows, the number of connections for Gables C is significantly lower than for Gables A and Gables B, especially at the peak party hours of 12:00 AM to 2:00 AM, suggesting that it might be more beneficial for RAs to focus on those two towers.

There are many applications to these charts because they essentially show student activity in any building.
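The quantity plotted on these charts can be sketched as a per-hour count of distinct devices. The record layout and helper name below are assumptions, and the weekday code in the example is arbitrary.

```python
from collections import defaultdict

def hourly_unique_connections(records, building, weekday):
    """Return a 24-element list of distinct devices seen per hour.

    records: iterable of (weekday, hour, mac_address, building),
             with weekday 0-6 and hour 0-23.
    """
    seen = defaultdict(set)
    for day, hour, mac, where in records:
        if day == weekday and where == building:
            seen[hour].add(mac)
    return [len(seen[h]) for h in range(24)]

# Made-up sample records
sample = [(5, 0, 'aa', 'Gables A'), (5, 0, 'bb', 'Gables A'),
          (5, 0, 'aa', 'Gables A'), (5, 1, 'aa', 'Gables A'),
          (5, 0, 'cc', 'Gables C'), (2, 0, 'dd', 'Gables A')]
hourly = hourly_unique_connections(sample, 'Gables A', weekday=5)
print(hourly[:3])
```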

In [40]:
def plot_connection_scatter(b):
    
    clear_output(wait=True)
    connection_scatter_plots, connection_scatter_labels = [], []
    #One (include checkbox, building dropdown, day slider, color dropdown) group per tab
    controls = [
        (connection_scatter_plot1_check, connection_scatter_building_dropdown1,
         connection_scatter_day_slider1, connection_scatter_color_dropdown1),
        (connection_scatter_plot2_check, connection_scatter_building_dropdown2,
         connection_scatter_day_slider2, connection_scatter_color_dropdown2),
        (connection_scatter_plot3_check, connection_scatter_building_dropdown3,
         connection_scatter_day_slider3, connection_scatter_color_dropdown3),
    ]
    
    for check, building_dropdown, day_slider, color_dropdown in controls:
        if check.value:
            #Plot only the tabs whose "Include" checkbox is ticked
            connection_scatter_plots.append(plot_connections(building_dropdown.value,
                                                             day_slider.value, color_dropdown.value))
            connection_scatter_labels.append(str(building_dropdown.value) + ' For '
                                             + weekday_dict[day_slider.value])
    
    plt.legend(connection_scatter_plots, connection_scatter_labels)

connection_scatter_text1 = widgets.Latex(value='Select Building:', width='10%')
connection_scatter_building_dropdown1 = widgets.Dropdown(options = list_buildings, height='25px')
connection_scatter_hbox1 = widgets.HBox(children=[connection_scatter_text1, connection_scatter_building_dropdown1],
                                        width='100%', height='50px')

connection_scatter_text2 = widgets.Latex(value='Select Building:', width='10%')
connection_scatter_building_dropdown2 = widgets.Dropdown(options = list_buildings, height='25px')
connection_scatter_hbox2 = widgets.HBox(children=[connection_scatter_text2, connection_scatter_building_dropdown2],
                                        width='100%', height='50px')

connection_scatter_text3 = widgets.Latex(value='Select Building:', width='10%')
connection_scatter_building_dropdown3 = widgets.Dropdown(options = list_buildings, height='25px')
connection_scatter_hbox3 = widgets.HBox(children=[connection_scatter_text3, connection_scatter_building_dropdown3],
                                        width='100%', height='50px')
#--------------------------------------------------------------------------------------
connection_scatter_text4 = widgets.Latex(value='Select Day:', width='10%')
connection_scatter_day_slider1 = widgets.IntSlider(min=0, max=6, step=1)
connection_scatter_hbox4 = widgets.HBox(children=[connection_scatter_text4, connection_scatter_day_slider1],
                                        width='100%', height='50px')

connection_scatter_text5 = widgets.Latex(value='Select Day:', width='10%')
connection_scatter_day_slider2 = widgets.IntSlider(min=0, max=6, step=1)
connection_scatter_hbox5 = widgets.HBox(children=[connection_scatter_text5, connection_scatter_day_slider2],
                                        width='100%', height='50px')

connection_scatter_text6 = widgets.Latex(value='Select Day:', width='10%')
connection_scatter_day_slider3 = widgets.IntSlider(min=0, max=6, step=1)
connection_scatter_hbox6 = widgets.HBox(children=[connection_scatter_text6, connection_scatter_day_slider3],
                                        width='100%', height='50px')
#--------------------------------------------------------------------------------------
connection_scatter_text7 = widgets.Latex(value='Select Color:', width='10%')
connection_scatter_color_dropdown1 = widgets.Dropdown(options = my_colors, height='25px')
connection_scatter_hbox7 = widgets.HBox(children=[connection_scatter_text7, connection_scatter_color_dropdown1],
                                        width='100%', height='50px')

connection_scatter_text8 = widgets.Latex(value='Select Color:', width='10%')
connection_scatter_color_dropdown2 = widgets.Dropdown(options = my_colors, height='25px')
connection_scatter_hbox8 = widgets.HBox(children=[connection_scatter_text8, connection_scatter_color_dropdown2],
                                        width='100%', height='50px')

connection_scatter_text9 = widgets.Latex(value='Select Color:', width='10%')
connection_scatter_color_dropdown3 = widgets.Dropdown(options = my_colors, height='25px')
connection_scatter_hbox9 = widgets.HBox(children=[connection_scatter_text9, connection_scatter_color_dropdown3],
                                        width='100%', height='50px')


connection_scatter_tab1 = widgets.VBox(children=[connection_scatter_hbox1, connection_scatter_hbox4, connection_scatter_hbox7])
connection_scatter_tab2 = widgets.VBox(children=[connection_scatter_hbox2, connection_scatter_hbox5, connection_scatter_hbox8])
connection_scatter_tab3 = widgets.VBox(children=[connection_scatter_hbox3, connection_scatter_hbox6, connection_scatter_hbox9])

connection_scatter_tab = widgets.Tab(children=[connection_scatter_tab1, connection_scatter_tab2, connection_scatter_tab3])
connection_scatter_tab.set_title(0, 'Plot1')
connection_scatter_tab.set_title(1, 'Plot2')
connection_scatter_tab.set_title(2, 'Plot3')
display(connection_scatter_tab)

connection_scatter_button = widgets.Button(description='Display Plots!', width='10%', padding=10)
connection_scatter_button.on_click(plot_connection_scatter)

connection_scatter_plot1_check = widgets.Checkbox(description = 'Include Plot1', value=False, width=40)
connection_scatter_plot2_check = widgets.Checkbox(description = 'Include Plot2', value=False, width=40)
connection_scatter_plot3_check = widgets.Checkbox(description = 'Include Plot3', value=False, width=40)

connection_scatter_hbox10 = widgets.HBox(children=[connection_scatter_button, connection_scatter_plot1_check, 
                                                   connection_scatter_plot2_check, connection_scatter_plot3_check],
                                         width='100%', height='50px')
display(connection_scatter_hbox10)

3.3 Interactive Geographic Plotting

<a id = '3.3'></a> From this interactive interface one is able to obtain a great deal of information about the UNH campus through geographic plots of the campus.

This interface consists of three tabs that each provide a different function.

3.3.1 Path Plot Tab

<a id = '3.3.1'></a> The first tab is the Path Plot tab. On this tab you type in the buildings you're interested in, separated by a comma and a space, check the days of the week you're interested in, and select a time range with the slider. Then hit the "Plot Paths!" button and a plot showing where people are going from your buildings is generated. From this plot one can visualize where people on campus are going at a certain day and time.
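The interface splits the text box contents on a comma followed by exactly one space, so entries typed with different spacing will not match a building name. A more tolerant parser could look like the sketch below; `parse_buildings` is a hypothetical helper and is not used by the interface.

```python
def parse_buildings(text):
    """Split a comma-separated string into clean building names."""
    return [name.strip() for name in text.split(',') if name.strip()]

# Irregular spacing and a trailing comma are both tolerated
buildings = parse_buildings('Dimond Library,Kingsbury ,  Stoke,')
print(buildings)
```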

3.3.2 Heat Map Tab

<a id = '3.3.2'></a> The second tab is the Heat Map tab. On this tab you just select the days of the week and a time range you're interested in, and then hit the "Plot Heat Map!" button. A heat map for the number of connections in every building is then generated. From this plot one could see what buildings have the most traffic at a certain time and date.

3.3.3 Iterative Path Plot Tab

<a id = '3.3.3'></a> The last tab is the Iterative Path Plot tab, which has many options to select from.

The first row of options has three dropdown boxes. The first is the building you're interested in, the second is the number of iterations you want, and the third is the number of starting paths you're interested in. The third option keeps too many people from being tracked in later iterations; if you still want every path tracked, choose the 'all' option at the bottom of the dropdown.

The next two rows are for selecting the days of the week and the time range you're interested in.

The last row of options contains a dropdown box for the maximum number of buildings you're interested in seeing, a text box where you enter how many paths you want to see after the first iteration, and a "Plot Iterative Paths!" button that generates the plots.

This function takes a while to run because it displays a separate plot for each iteration. Also note that if 'all' is selected for the number of starting paths, the maximum number of buildings and number of paths options in the last row are ignored. From this plot you are able to track people's movements for multiple iterations.

In [41]:
def plot_path_map(b):
    
    clear_output(wait=True)
    btext_list = (text.value).split(', ')
    weekday_list = [i for i in range(7) if hbox2.children[i+1].value==True]
    time_range = start_time_slider.value
    plot_path_lines(weekday_list, time_range,
                    btext_list)
    
def plot_heat_map(b):
    
    clear_output(wait=True)
    btext_list = (text.value).split(', ')
    weekday_list = [i for i in range(7) if hbox2.children[i+1].value==True]
    time_range = start_time_slider.value
    plot_path_lines(weekday_list, time_range,
                    btext_list, heat_map=True)
    
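def plot_iter_paths(b):
    
    clear_output(wait=True)
    weekday_list = [i for i in range(7) if hbox2.children[i+1].value==True]
    time_range = start_time_slider.value
    start_num = dropdown_iter_building_number.value
    if start_num == 'all':
        #The max buildings and path number options are ignored when every path is tracked
        plot_iterative_path_lines(weekday_list, time_range, dropdown_iter_building.value,
                                  int(dropdown_iter_number.value), start_num='all')
    else:
        plot_iterative_path_lines(weekday_list, time_range, dropdown_iter_building.value,
                                  int(dropdown_iter_number.value), start_num=start_num,
                                  max_buildings=int(dropdown_iter_max_buildings.value), path_num=int(iter_text.value))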

text_label = widgets.Latex(value='Enter Buildings to plot:', width='20%')
text = widgets.Text(description='', width='70%')
hbox1 = widgets.HBox(height='40px',width='100%')
hbox1.children = [text_label, text]

text_label2 = widgets.Latex(value='Day of Week:', width='10%')
monday_checkbox = widgets.Checkbox(description = 'Monday:   ', value=False, width='10%')
tuesday_checkbox = widgets.Checkbox(description = 'Tuesday:  ', value=False, width='10%')
wednesday_checkbox = widgets.Checkbox(description = 'Wednesday:', value=False, width='10%')
thursday_checkbox = widgets.Checkbox(description = 'Thursday: ', value=False, width='10%')
friday_checkbox = widgets.Checkbox(description = 'Friday:   ', value=False, width='10%')
saturday_checkbox = widgets.Checkbox(description = 'Saturday: ', value=False, width='10%')
sunday_checkbox = widgets.Checkbox(description = 'Sunday:   ', value=False, width='10%')
hbox2 = widgets.HBox(height='40px',width='100%')
hbox2.children = [text_label2, monday_checkbox, tuesday_checkbox, wednesday_checkbox, thursday_checkbox, 
                  friday_checkbox, saturday_checkbox, sunday_checkbox]

text_label3 = widgets.Latex(value='Enter Time Range:', width='10%')
start_time_slider = widgets.IntRangeSlider(min=0,max=23,step=1,value=(0,23), width='80%')
hbox3 = widgets.HBox(height='40px',width='100%')
hbox3.children = [text_label3, start_time_slider]

path_plot_button = widgets.Button(description='Plot Paths!', width='10%')
submit_path_hbox = widgets.HBox(height='40px',width='100%')
submit_path_hbox.children = [path_plot_button]
path_plot_button.on_click(plot_path_map)

heat_button = widgets.Button(description='Plot Heat Map!')
submit_heat_hbox = widgets.HBox(height='40px',width='100%')
submit_heat_hbox.children = [heat_button]
heat_button.on_click(plot_heat_map)

building_range = '5 6 7 8 9 10 11 12 13 14 15 all'.split()
iter_building_latex = widgets.Latex(value='Select Building:', width='10%')
iter_number_latex = widgets.Latex(value='Select # Iterations:', width='15%')
iter_building_number = widgets.Latex(value='Select # Start Paths:', width='10%')
dropdown_iter_building_number = widgets.Dropdown(options=building_range, width='15%')
dropdown_iter_building = widgets.Dropdown(options=list_buildings, width='15%')
dropdown_iter_number = widgets.Dropdown(options=['2', '3', '4'])
iter_hbox = widgets.HBox(height='40px',width='100%')
iter_hbox.children = [iter_building_latex, dropdown_iter_building, iter_number_latex, dropdown_iter_number,
                      iter_building_number, dropdown_iter_building_number]

iter_build_num_latex = widgets.Latex(value='Select Max # Buildings:', width='15%')
iter_path_num = widgets.Latex(value='Enter # Paths:', width='10%')
dropdown_iter_max_buildings = widgets.Dropdown(options=building_range)
iter_text = widgets.Text(description='', width='20%')

iter_button = widgets.Button(description='Plot Iterative Paths!')
submit_iter_hbox = widgets.HBox(height='40px',width='100%')
submit_iter_hbox.children = [iter_build_num_latex, dropdown_iter_max_buildings, iter_path_num, iter_text, iter_button]
iter_button.on_click(plot_iter_paths)

path_tab = widgets.VBox(children=[hbox1, hbox2, hbox3, submit_path_hbox])
heat_tab = widgets.VBox(children=[hbox2, hbox3, submit_heat_hbox])
iterative_tab = widgets.VBox(children=[iter_hbox, hbox2, hbox3, submit_iter_hbox])

tab = widgets.Tab(children=[path_tab, heat_tab, iterative_tab])
tab.set_title(0, 'Path Plot')
tab.set_title(1, 'Heat Map')
tab.set_title(2, 'Iterative Path Plot')
display(tab)

4 Results

<a id = '4'></a> Our results showed that tracking WiFi connections on a university campus provides enough information to get a good general sense of how students travel around campus and where they are at given points throughout the day. We believe this data can be useful for university administrators, the public safety office, and the university information technology department. This research and initial look into student movement patterns creates a platform on which these stakeholders can find answers to questions about the student body, or find new questions to ask that they wouldn't have been able to answer before having this resource.

5 Discussion / Conclusion

<a id = '5'></a> Through the process of performing this research, we succeeded in showing that the data the university is already collecting on its students is highly valuable for smart campus applications. We set out to show that this data could be used in place of investing in extremely expensive technologies to perform the same task.

However, while we were successful in demonstrating the usefulness of this data, our findings do have some limitations. For starters, it would be best if we had access to a live stream of data; we are still waiting to hear back from the IT department about whether that is possible or whether the access points only transmit connection data at a set interval. Access to a live stream would be very valuable for responding to public safety concerns such as disasters or events such as school shootings. A live heat map of the campus could give first responders or public safety officers a better idea of the situation.

Additionally, working with devices other than strictly Apple devices is a clear next step. Doing so would require filtering out stationary devices while keeping the devices that travel with students. For an initial proof of concept, working strictly with Apple devices is sufficient; however, the extra insight gained from covering as much of the student body as possible would be helpful.

Lastly, a major limitation was that we did not have disconnect times in the data. This makes it impossible to know for sure whether a student has left a building. This is an unfortunate limitation; however, by filtering the data it is fairly easy to presume which students have left campus, assuming students aren't spending more than 9 hours in a classroom.
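One way to apply the 9-hour assumption is to treat each stay as ending at the device's next connection, or after 9 hours if no further connection is seen in time. The sketch below is a hypothetical illustration of that rule, not the filter used in this notebook; the record layout and helper name are assumptions.

```python
MAX_STAY_HOURS = 9  # assumption from the text: nobody stays longer than this

def estimate_stays(stops, max_stay_hours=MAX_STAY_HOURS):
    """Estimate (building, arrive, leave) intervals for one device.

    stops: list of (hour_as_float, building), sorted by time.
    A stay ends at the next connection, or after max_stay_hours
    if no further connection is seen in time.
    """
    stays = []
    for i, (arrive, building) in enumerate(stops):
        cutoff = arrive + max_stay_hours
        if i + 1 < len(stops):
            leave = min(stops[i + 1][0], cutoff)
        else:
            leave = cutoff
        stays.append((building, arrive, leave))
    return stays

# Made-up day for one device
stays = estimate_stays([(8.0, 'Kingsbury'), (10.5, 'Dimond Library'),
                        (21.0, 'Christensen')])
print(stays)
```

In this example the device is presumed to have left Dimond Library at 19.5 hours, even though its next connection is not seen until 21.0.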

The next steps for this project are to create succinct dashboards that are specific to a desired use case. We plan to work with departments at the university to create dashboards tailored to their needs and help usher the University of New Hampshire closer to being a smart campus.